AITopics | tatsuya kawahara

Collaborating Authors

tatsuya kawahara

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions?

Pang, Zi Haur, Fu, Yahui, Lala, Divesh, Elmers, Mikey, Inoue, Koji, Kawahara, Tatsuya

arXiv.org Artificial IntelligenceMar-17-2025

We investigate the impact of robot appearance on users' spoken behavior during real-world interactions by comparing a human-like android, ERICA, with a less anthropomorphic humanoid, TELECO. Analyzing data from 42 participants at SIGDIAL 2024, we extracted linguistic features such as disfluencies and syntactic complexity from conversation transcripts. The results showed moderate effect sizes, suggesting that participants produced fewer disfluencies and employed more complex syntax when interacting with ERICA. Further analysis involving training classification models like Na\"ive Bayes, which achieved an F1-score of 71.60\%, and conducting feature importance analysis, highlighted the significant role of disfluencies and syntactic complexity in interactions with robots of varying human-like appearances. Discussing these findings within the frameworks of cognitive load and Communication Accommodation Theory, we conclude that designing robots to elicit more structured and fluent user speech can enhance their communicative alignment with humans.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2503.13625

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.07)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)
(2 more...)

Add feedback

Human-Like Embodied AI Interviewer: Employing Android ERICA in Real International Conference

Pang, Zi Haur, Fu, Yahui, Lala, Divesh, Elmers, Mikey, Inoue, Koji, Kawahara, Tatsuya

arXiv.org Artificial IntelligenceDec-13-2024

This paper introduces the human-like embodied AI interviewer which integrates android robots equipped with advanced conversational capabilities, including attentive listening, conversational repairs, and user fluency adaptation. Moreover, it can analyze and present results post-interview. We conducted a real-world case study at SIGDIAL 2024 with 42 participants, of whom 69% reported positive experiences. This study demonstrated the system's effectiveness in conducting interviews just like a human and marked the first employment of such a system at an international conference. The demonstration video is available at https://youtu.be/jCuw9g99KuE.

artificial intelligence, large language model, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.09867

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report (1.00)
Personal > Interview (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
(3 more...)

Add feedback

Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection

Inoue, Koji, Lala, Divesh, Skantze, Gabriel, Kawahara, Tatsuya

arXiv.org Artificial IntelligenceOct-21-2024

In human conversations, short backchannel utterances such as "yeah" and "oh" play a crucial role in facilitating smooth and engaging dialogue. These backchannels signal attentiveness and understanding without interrupting the speaker, making their accurate prediction essential for creating more natural conversational agents. This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection (VAP) model. While existing approaches have relied on turn-based or artificially balanced datasets, our approach predicts both the timing and type of backchannels in a continuous and frame-wise manner on unbalanced, real-world datasets. We first pre-train the VAP model on a general dialogue corpus to capture conversational dynamics and then fine-tune it on a specialized dataset focused on backchannel behavior. Experimental results demonstrate that our model outperforms baseline methods in both timing and type prediction tasks, achieving robust performance in real-time environments. This research offers a promising step toward more responsive and human-like dialogue systems, with implications for interactive spoken dialogue applications such as virtual assistants and robots.

backchannel, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.15929

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)

Add feedback

Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems

Elmers, Mikey, Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya

arXiv.org Artificial IntelligenceOct-4-2024

This study examined users' behavioral differences in a large corpus of Japanese human-robot interactions, comparing interactions between a tele-operated robot and an autonomous dialogue system. We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios. Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter between operator-controlled and autonomous conditions. Furthermore, we developed predictive models to distinguish between operator and autonomous system conditions. Our models demonstrated higher accuracy and precision compared to the baseline model, with several models also achieving a higher F1 score than the baseline.

backchannel, job interview scenario, scenario, (13 more...)

arXiv.org Artificial Intelligence

2410.03147

Country:

North America > United States > Pennsylvania (0.04)
Asia > Singapore (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report > Experimental Study (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems

Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya, Skantze, Gabriel

arXiv.org Artificial IntelligenceJan-23-2024

Establishing evaluation schemes for spoken dialogue systems is important, but it can also be challenging. While subjective evaluations are commonly used in user experiments, objective evaluations are necessary for research comparison and reproducibility. To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors. In this paper, to this end, we investigate the relationship between user behaviors and subjective evaluation scores in social dialogue tasks: attentive listening, job interview, and first-meeting conversation. The results reveal that in dialogue tasks where user utterances are primary, such as attentive listening and job interview, indicators like the number of utterances and words play a significant role in evaluation. Observing disfluency also can indicate the effectiveness of formal tasks, such as job interview. On the other hand, in dialogue tasks with high interactivity, such as first-meeting conversation, behaviors related to turn-taking, like average switch pause length, become more important. These findings suggest that selecting appropriate user behaviors can provide valuable insights for objective evaluation in each social dialogue task.

dialogue, dialogue task, evaluation, (17 more...)

arXiv.org Artificial Intelligence

2401.04867

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.06)
Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.50)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

Inoue, Koji, Jiang, Bing'er, Ekstedt, Erik, Kawahara, Tatsuya, Skantze, Gabriel

arXiv.org Artificial IntelligenceJan-9-2024

A demonstration of a real-time and continuous turn-taking prediction system is presented. The system is based on a voice activity projection (VAP) model, which directly maps dialogue stereo audio to future voice activities. The VAP model includes contrastive predictive coding (CPC) and self-attention transformers, followed by a cross-attention transformer. We examine the effect of the input context audio length and demonstrate that the proposed system can operate in real-time with CPU settings, with minimal performance degradation.

participant, real-time and continuous turn-taking prediction, transformer, (9 more...)

arXiv.org Artificial Intelligence

2401.04868

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.06)
Europe > Sweden (0.05)
Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Architecture > Real Time Systems (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors

Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya, Skantze, Gabriel

arXiv.org Artificial IntelligenceSep-25-2023

This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors. In this study, our main focus is on assessing the human-likeness of the robot as the primary evaluation metric. While previous research often relied on subjective evaluations from users, our approach aims to evaluate the robot's human-likeness based on observable user behaviors indirectly, thus enhancing objectivity and reproducibility. To begin, we created an annotated dataset of human-likeness scores, utilizing user behaviors found in an attentive listening dialogue corpus. We then conducted an analysis to determine the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of our proposed behavior-based evaluation method.

dialogue, robot, user behavior, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3610661.3617151

2308.1102

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.07)
Europe > France > Île-de-France > Paris > Paris (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.36)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.31)

Add feedback

I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue

Li, Yuanchao, Inoue, Koji, Tian, Leimin, Fu, Changzeng, Ishi, Carlos, Ishiguro, Hiroshi, Kawahara, Tatsuya, Lai, Catherine

arXiv.org Artificial IntelligenceMar-17-2023

Current Spoken Dialogue Systems (SDSs) often serve as passive listeners that respond only after receiving user speech. To achieve human-like dialogue, we propose a novel future prediction architecture that allows an SDS to anticipate future affective reactions based on its current behaviors before the user speaks. In this work, we investigate two scenarios: speech and laughter. In speech, we propose to predict the user's future emotion based on its temporal relationship with the system's current emotion and its causal relationship with the system's current Dialogue Act (DA). In laughter, we propose to predict the occurrence and type of the user's laughter using the system's laughter behaviors in the current turn. Preliminary analysis of human-robot dialogue demonstrated synchronicity in the emotions and laughter displayed by the human and robot, as well as DA-emotion causality in their dialogue. This verifies that our architecture can contribute to the development of an anticipatory SDS.

artificial intelligence, laughter, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.00146

Country:

Europe > Germany > Hamburg (0.05)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
Asia > Japan > Shikoku > Tokushima Prefecture > Tokushima (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.87)

Add feedback

Intelligent Conversational Android ERICA Applied to Attentive Listening and Job Interview

Kawahara, Tatsuya, Inoue, Koji, Lala, Divesh

arXiv.org Artificial IntelligenceMay-2-2021

Following the success of spoken dialogue systems (SDS) in smartphone assistants and smart speakers, a number of communicative robots are developed and commercialized. Compared with the conventional SDSs designed as a human-machine interface, interaction with robots is expected to be in a closer manner to talking to a human because of the anthropomorphism and physical presence. The goal or task of dialogue may not be information retrieval, but the conversation itself. In order to realize human-level "long and deep" conversation, we have developed an intelligent conversational android ERICA. We set up several social interaction tasks for ERICA, including attentive listening, job interview, and speed dating. To allow for spontaneous, incremental multiple utterances, a robust turn-taking model is implemented based on TRP (transition-relevance place) prediction, and a variety of backchannels are generated based on time frame-wise prediction instead of IPU-based prediction. We have realized an open-domain attentive listening system with partial repeats and elaborating questions on focus words as well as assessment responses. It has been evaluated with 40 senior people, engaged in conversation of 5-7 minutes without a conversation breakdown. It was also compared against the WOZ setting. We have also realized a job interview system with a set of base questions followed by dynamic generation of elaborating questions. It has also been evaluated with student subjects, showing promising results.

dialogue, intelligent conversational android erica applied, tatsuya kawahara, (12 more...)

arXiv.org Artificial Intelligence

2105.00403

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback